refactor(ltm): redesign long-term memory with append-only incremental contexts by RC-CHN · Pull Request #8144 · AstrBotDevs/AstrBot

RC-CHN · 2026-05-11T01:38:08Z

Motivation

Fixes #8080
Rewrite the long-term memory (LTM) module from a ring buffer to an append-only architecture that keeps context prefixes stable across requests — enabling KV cache hits and the associated cost discounts (typically 1/10 of standard pricing across OpenAI, Anthropic, DeepSeek, and cloud providers).

Modifications / 改动点

Core: `astrbot/builtin_stars/astrbot/long_term_memory.py`

Replace max_cnt ring buffer with raw_records (deque) + _raw_cursor + contexts (append-only list). Old segments are never rebuilt.
_build_segments() converts raw chat lines into OpenAI-format context segments, handling tool calls, parallel tools, and multi-step chains.
<BOT/> markers replace [You/] to avoid nickname collisions.
on_agent_done records tool-call chains and now includes the @bot prompt in contexts so future rounds see the user's original message.
asyncio.Lock for concurrency safety; remove_session() for cleanup.

Hook wiring: `astrbot/builtin_stars/astrbot/main.py`

Swap @on_llm_response → @on_agent_done for accurate tool-chain recording.
Lazy toggle detection: false→true cleans stale state on next message.
group_icl_enable=true skips Conversation DB query (conversation=None).

Config: `astrbot/builtin_stars/astrbot/default.py`

Default context_limit_reached_strategy → "llm_compress".

Agent runner: `astrbot/core/astr_main_agent.py`

_get_compress_provider auto-falls back to the main chat provider when llm_compress_provider_id is unset, preventing silent truncation.

Tests: `tests/unit/test_long_term_memory.py` (new, 47 tests)

Pure functions: extract, parse, truncate, build_segments (31 tests).
Integration: round-trip lifecycle, multi-round accumulation, tool chains, persona preservation, concurrent safety (16 tests).
This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

Tested on personal self-hosted astrbot.

Checklist / 检查清单

😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能，已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试，并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
/ 我确保没有引入新依赖库，或者引入了新依赖库的同时将其添加到 requirements.txt 和 pyproject.toml 文件相应位置。
😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。

Summary by Sourcery

Refactor the long-term memory subsystem to use an append-only, incremental context architecture and integrate it with agent completion hooks, while improving default compression behavior and regression coverage.

Enhancements:

Redesign group chat long-term memory to store raw messages and derived contexts in an append-only structure with concurrency-safe trimming and incremental segment building for LLM requests.
Update main agent wiring to build LLM contexts at request time, record full agent tool-call chains after completion, and lazily reset long-term memory state when toggling group ICL.
Change the default context limit reached strategy to use LLM-based compression instead of truncating by turns.
Allow the compression pipeline to fall back to the primary chat provider when no dedicated compression provider is configured or available.

Tests:

Add an extensive test suite for the new long-term memory implementation, covering parsing helpers, segment construction, multi-round accumulation, tool-chain recording, extreme inputs, persona interaction, and concurrency behavior.

Summary by Sourcery

Refactor group long-term memory to an append-only, incrementally built context model integrated with agent completion hooks, while tightening context compression behavior and isolating request-time context guarding from persistent history management.

Enhancements:

Redesign long-term memory to store raw group messages and derived LLM contexts in an append-only structure with configurable truncation and optional LLM-based summarization, including tool-call chains and bot replies.
Introduce a request-scoped context guard in the agent runner so per-request truncation/compression no longer mutates persistent conversation history, and adjust provider payload construction accordingly.
Adjust group ICL wiring to build contexts at request time, record agent results via the agent-done hook, and lazily reset LTM state when toggling group memory on or off.
Expand provider LTM and main agent configuration to support richer compaction controls, history tool-result truncation, raw record size limits, and improved default context compression strategy (LLM-based by default).

Tests:

Add an extensive unit test suite for the new long-term memory implementation covering parsing helpers, segment construction, multi-round accumulation, tool-chain recording, extreme inputs, persona interaction, and concurrency behavior.

…l contexts

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

The MAX_* limits (MAX_MSGS_PER_USER_SEGMENT, MAX_CHARS_PER_USER_SEGMENT, MAX_RAW_BYTES) are currently hard-coded; consider wiring these through configuration (e.g., provider_ltm_settings) so different deployments or groups can tune memory usage and retention behavior without code changes.
In _trim_raw_records, total is recomputed by summing len(s.encode()) on every call, which is O(n); if this runs frequently on busy groups, consider tracking a running byte-size counter per umo to avoid repeatedly traversing the deque.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The MAX_* limits (MAX_MSGS_PER_USER_SEGMENT, MAX_CHARS_PER_USER_SEGMENT, MAX_RAW_BYTES) are currently hard-coded; consider wiring these through configuration (e.g., provider_ltm_settings) so different deployments or groups can tune memory usage and retention behavior without code changes.
- In _trim_raw_records, total is recomputed by summing len(s.encode()) on every call, which is O(n); if this runs frequently on busy groups, consider tracking a running byte-size counter per umo to avoid repeatedly traversing the deque.

## Individual Comments

### Comment 1
<location path="astrbot/builtin_stars/astrbot/long_term_memory.py" line_range="290-299" />
<code_context>
+    # 裁剪
+    # =========================================================================
+
+    def _trim_raw_records(self, umo: str) -> None:
+        """仅淘汰 cursor 之前的条目。cursor 之后的绝不碰（issue #2）。"""
+        dq = self.raw_records[umo]
+        cursor = self._raw_cursor[umo]
+
+        # 1. 无条件清除 cursor 之前的条目（已消费）
+        while dq and cursor > 0:
+            dq.popleft()
+            cursor -= 1
+        self._raw_cursor[umo] = cursor
+
+        # 2. 按大小继续从前面淘汰（限制极端情况的总内存）
+        total = sum(len(s.encode()) for s in dq)
+        while total > MAX_RAW_BYTES and dq and cursor > 0:
+            removed = dq.popleft()
+            total -= len(removed.encode())
</code_context>
<issue_to_address>
**issue (bug_risk):** Size-based trimming branch is effectively dead due to cursor reset logic.

In `_trim_raw_records`, the first loop always decrements `cursor` to 0 and then writes it back to `self._raw_cursor[umo]`. As a result, in the size-based loop `while total > MAX_RAW_BYTES and dq and cursor > 0:`, `cursor` is always 0 and the loop never runs, so `MAX_RAW_BYTES` is never enforced.

To preserve the intended behavior (always drop fully-consumed entries, and then optionally drop additional consumed entries to satisfy `MAX_RAW_BYTES`), you’ll need to decouple the notion of “consumed index” from the deque length. For example, track how many entries are removed in the first loop and use that to derive which entries are safe to drop in the size-based phase, rather than relying on `cursor > 0` after the first loop.
</issue_to_address>

### Comment 2
<location path="tests/unit/test_long_term_memory.py" line_range="207-216" />
<code_context>
+    def test_tool_call_then_result_then_bot(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for `_build_segments` when a tool result appears without a preceding tool call.

Current `_build_segments` tests only cover well-formed tool flows (`<T:CALL>` → `<T:RES>` → `<BOT>`). Please add a case where a `<T:RES>` appears without a prior `<T:CALL>`, e.g.:

```python
def test_tool_result_without_call_then_bot(self):
    lines = [
        "<T:RES id=orphan>data</T:RES>",
        "<BOT/14:30>: ok",
    ]
    result = _build_segments(lines)
    # assert behavior: either a valid tool segment or clean ignore, no exception,
    # and an intact assistant segment.
```

This helps ensure `_build_segments` behaves predictably with partial or inconsistent histories.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

gemini-code-assist

Code Review

This pull request introduces Long Term Memory (LTM) v2, which significantly improves chatroom memory management by implementing incremental context building, support for tool-call history, and memory-efficient message tracking using deques and cursors. The changes also include a fallback mechanism for LLM compression and extensive unit tests. Feedback focuses on several critical areas: a potential memory leak in the contexts dictionary which is currently append-only, a logic error in the size-based trimming of raw records that renders some code unreachable, and the risk of KeyError crashes when parsing malformed tool-call records. Additionally, there is a discrepancy between the system prompt's description of bot message markers and the actual role-based formatting sent to the LLM.

gemini-code-assist · 2026-05-11T01:41:19Z

+        self.contexts: dict[str, list[dict]] = defaultdict(list)
+        """累积累积态 LLM 上下文。由 ContextManager 修改后保留。"""


The self.contexts dictionary is append-only and never pruned. In long-running sessions or active group chats, this will lead to a memory leak as the list of segments grows indefinitely. While append-only contexts help with KV cache hits, you should still implement a maximum context length (e.g., based on the provider's window or a safe segment count) to prevent unbounded memory growth.

gemini-code-assist · 2026-05-11T01:41:19Z

+        async with self._lock:
+            umo = event.unified_msg_origin

+            # 记录写入前索引 → on_req_llm 精确排除（issue #1, #9）
+            raw_idx = len(self.raw_records[umo])
+            event.set_extra("_ltm_raw_idx", raw_idx)


handle_message appends to raw_records but never triggers trimming. In groups that rarely interact with the bot, raw_records will grow indefinitely because _trim_raw_records is only called during an agent run. Trimming should be performed here (before calculating raw_idx) to ensure memory usage remains bounded. Note that since this logic is synchronous and does not contain 'await' calls, it is executed atomically in the asyncio event loop and does not require an explicit lock.

umo = event.unified_msg_origin self._trim_raw_records(umo) # 记录写入前索引 → on_req_llm 精确排除（issue #1, #9） raw_idx = len(self.raw_records[umo]) event.set_extra("_ltm_raw_idx", raw_idx)

References

In a single-threaded asyncio event loop, synchronous functions (code blocks without 'await') are executed atomically and will not be interrupted by other coroutines. Therefore, they are safe from race conditions when modifying shared state within that block.

…ol call re-persistence & add truncate tool

…_tool_call

…strategies exclusive

…records_max_bytes

…message_max_cn

…nt using $refs (AstrBotDevs#8186)

…8153) * chore: streamline convert_audio_to_opus logic - Route Opus conversion directly through the underlying convert_audio_format. - Remove redundant FFmpeg processing chains to improve code reusability. * perf: optimize AMR voice encoding parameters - Enhance AMR audio quality via built-in FFmpeg filters.

AstrBotDevs#8136) * fix: handle None tool arguments returned by Claude API for no-parameter tools * fix: handle None tool arguments from Claude API for no-parameter tools * fix: generalize None tool args comment * fix: generalize None tool args comment * 去除空格，以保证格式正确

* fix: add ollama and nvidia embedding * fix: address code review feedback for embedding providers - Remove redundant proxy branch in NvidiaEmbeddingProvider._get_client - Change ClientError handling to re-raise instead of wrapping in Exception - Add exc_info=True for better error diagnostics - Remove redundant isinstance check in OllamaEmbeddingProvider._build_payload

* fix: surface weixin media send failures * fix: include weixin send failure context * Delete tests/unit/test_weixin_oc_adapter.py --------- Co-authored-by: Weilong Liao <37870767+Soulter@users.noreply.github.com>

) * feat(lark): implement app registration and bot info retrieval - Add app registration functionality for Lark and Feishu platforms, including endpoints and request handling. - Introduce polling mechanism for app registration status. - Create bot info retrieval functionality to fetch bot details after successful registration. - Enhance dashboard with new UI components for one-click QR setup and manual setup options. - Update internationalization files to support new features and actions. - Add unit tests for app registration endpoint resolution and data handling. * feat(weixin_oc): add WeChat login registration and QR code handling

…avoid crashes on invalid or empty values * fix: add comments and await asyncio.sleep(0) for startup signal * fix: [Bug] 修复 MiniMax TTS 空字符串配置解析报错 * fix: 采纳AI审查建议，添日志+提取默认配置变量 * fix: 移除误加的core_lifecycle.py改动 --------- Co-authored-by: RainBot-Ai <qianlanzhiya@gmail.com>

…#8015) The WebUI only loaded Noto Sans SC (Simplified Chinese), which lacks Cyrillic glyphs. Russian text fell back to system sans-serif, causing poor rendering depending on the OS. Changes: - Load Noto Sans (regular) from Google Fonts alongside Noto Sans SC - Add 'Noto Sans' at the END of $cjk-sans-fallback (after CJK fonts) so Chinese text still renders with system CJK fonts first, while Cyrillic text falls through to Noto Sans. This ensures both Chinese and Cyrillic text render correctly.

…tDevs#8196)

…nism (AstrBotDevs#8198)

…ffmpeg failure (AstrBotDevs#8009) * fix: detect Tencent SILK (\x02 prefix) in audio magic bytes to avoid ffmpeg failure QQ official bot sends voice in Tencent SILK format (leading \x02 byte before #!SILK_V3 magic). _get_audio_magic_type() had two off-by-one slice errors: 1. Standard SILK: header[:8] vs b'#!SILK_V3' (8 != 9 bytes) — never matched 2. Tencent SILK: not detected at all Fixes: - Standard SILK: header[:9] == b'#!SILK_V3' (correct 9-byte slice) - Tencent SILK: header[:1] == b"\x02" and header[1:10] == b'#!SILK_V3' - ensure_wav() routes detected silk to tencent_silk_to_wav() Before: QQ voice → ffmpeg → 'Invalid data found' After: QQ voice → magic detects silk → tencent_silk_to_wav → WAV OK * refactor: use startswith() for SILK magic byte detection Replace manual slice comparisons with startswith() — cleaner, less error-prone, and immune to off-by-one slice errors. Suggested by: sourcery-ai

* fix(core): pass images through active replies * fix: harden active reply image collection * test: avoid logger coupling in active reply test * Delete tests/unit/test_builtin_astrbot_main.py --------- Co-authored-by: Weilong Liao <37870767+Soulter@users.noreply.github.com>

…xt (AstrBotDevs#8205) PR AstrBotDevs#8015 added 'Noto Sans' to the Google Fonts link and CJK fallback list, but the font was placed at the end of $cjk-sans-fallback where browsers never reach it for Cyrillic text. The global $body-font-family also lacked 'Outfit' entirely, causing Vuetify to use CJK fonts as the primary face. Changes: - Remove 'Noto Sans' from the end of $cjk-sans-fallback (it is not a CJK font) - Add 'Outfit' and 'Noto Sans' to $body-font-family before CJK fallbacks - Update .Outfit class in _container.scss to match the new stack This ensures: - Latin text → Outfit - Cyrillic text → Noto Sans (loaded by vite-plugin-webfont-dl) - CJK text → Noto Sans SC / PingFang SC etc. Fixes follow-up to AstrBotDevs#8015.

) * feat(lark): implement app registration and bot info retrieval - Add app registration functionality for Lark and Feishu platforms, including endpoints and request handling. - Introduce polling mechanism for app registration status. - Create bot info retrieval functionality to fetch bot details after successful registration. - Enhance dashboard with new UI components for one-click QR setup and manual setup options. - Update internationalization files to support new features and actions. - Add unit tests for app registration endpoint resolution and data handling. * feat(weixin_oc): add WeChat login registration and QR code handling

…#8015) The WebUI only loaded Noto Sans SC (Simplified Chinese), which lacks Cyrillic glyphs. Russian text fell back to system sans-serif, causing poor rendering depending on the OS. Changes: - Load Noto Sans (regular) from Google Fonts alongside Noto Sans SC - Add 'Noto Sans' at the END of $cjk-sans-fallback (after CJK fonts) so Chinese text still renders with system CJK fonts first, while Cyrillic text falls through to Noto Sans. This ensures both Chinese and Cyrillic text render correctly.

…nism (AstrBotDevs#8198)

…xt (AstrBotDevs#8205) PR AstrBotDevs#8015 added 'Noto Sans' to the Google Fonts link and CJK fallback list, but the font was placed at the end of $cjk-sans-fallback where browsers never reach it for Cyrillic text. The global $body-font-family also lacked 'Outfit' entirely, causing Vuetify to use CJK fonts as the primary face. Changes: - Remove 'Noto Sans' from the end of $cjk-sans-fallback (it is not a CJK font) - Add 'Outfit' and 'Noto Sans' to $body-font-family before CJK fallbacks - Update .Outfit class in _container.scss to match the new stack This ensures: - Latin text → Outfit - Cyrillic text → Noto Sans (loaded by vite-plugin-webfont-dl) - CJK text → Noto Sans SC / PingFang SC etc. Fixes follow-up to AstrBotDevs#8015.

RC-CHN added 4 commits May 9, 2026 15:25

refactor(ltm): redesign long-term memory with append-only incrementa…

c8437f8

…l contexts

test(ltm): add 37 unit tests for long-term memory v2 rewrite

d76e835

fix:@bot prompt inclusion in contexts

ba4b7c1

test(ltm): add 10 integration tests

590fb46

auto-assign Bot requested review from advent259141 and anka-afk May 11, 2026 01:38

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 11, 2026

sourcery-ai Bot reviewed May 11, 2026

View reviewed changes

Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated

Comment thread tests/unit/test_long_term_memory.py

gemini-code-assist Bot reviewed May 11, 2026

View reviewed changes

style: format code

b7f5a73

RC-CHN marked this pull request as draft May 11, 2026 01:58

fix(long_term_memory): prevent exponential context inflation from to…

40bd637

…ol call re-persistence & add truncate tool

FFFold mentioned this pull request May 11, 2026

[Feature] 关于Astrbot设置里面的那些让大模型API资费猛增的"毒点" #8080

Open

2 tasks

RC-CHN added 2 commits May 11, 2026 16:51

fix(long_term_memory): dead code in _trim_raw_records size-based pruning

dce4d2b

fix(long_term_memory): correct CHATROOM_SYSTEM_NOTE and harden _parse…

ac5dfed

…_tool_call

RC-CHN mentioned this pull request May 11, 2026

fix: 降低 LLM 上下文调用成本风险 #8139

Open

5 tasks

RC-CHN added 12 commits May 11, 2026 17:11

style: format code

94e2c7c

feat: add truncate i18n

2b2c2d4

fix(agent): wrap async iterator next call for create_task typing

71d711c

refactor(agent): move conversation compaction to session layer, make …

b853d4d

…strategies exclusive

feat(long_term_memory): add LTM round compaction with two strategies

56909ab

feat(long_term_memory): expose MAX_RAW_BYTES as configurable ltm_raw_…

f0d1629

…records_max_bytes

test(long_term_memory): add LTM compaction tests

b547f3c

style: format code

19138b9

feat(long_term_memory): burst-drop compaction triggers, remove group_…

3a08305

…message_max_cn

docs(i18n): clarify ltm_raw_records_max_bytes hint across 3 locales

65b112d

fix(long_term_memory): guard against empty LLM summary response

d85d329

fix(long_term_memory): add cooldown for LLM summary retry on failure

c698b2d

M1LKT and others added 18 commits May 18, 2026 14:20

fix: synchronize the autoScroll state of the consoleDisplayer compone…

dfb9e95

…nt using $refs (AstrBotDevs#8186)

fix: surface weixin media send failures (AstrBotDevs#8175)

9073f52

* fix: surface weixin media send failures * fix: include weixin send failure context * Delete tests/unit/test_weixin_oc_adapter.py --------- Co-authored-by: Weilong Liao <37870767+Soulter@users.noreply.github.com>

feat(weixin_oc): handle session timeout and clear login state (AstrBo…

54613fd

…tDevs#8196)

fix: drop **kwargs bug in two register funcs (AstrBotDevs#8141)

176da6d

feat(dingtalk): implement one-click QR registration and polling mecha…

33b608f

…nism (AstrBotDevs#8198)

chore: bump version to 4.25.0

70a5c63

feat: add random suffix for weixin and dingtalk id

6051a36

chore: bump version to 4.25.1

2150951

docs: update release version instructions in AGENTS.md

1ff50ab

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels May 18, 2026

RC-CHN and others added 6 commits May 18, 2026 14:43

refactor(ltm): improve summary injection and lifecycle logging

1a32290

feat(dingtalk): implement one-click QR registration and polling mecha…

6144404

…nism (AstrBotDevs#8198)

feat: add random suffix for weixin and dingtalk id

fa18b11

RC-CHN closed this May 18, 2026

RC-CHN reopened this May 18, 2026

RC-CHN closed this May 18, 2026

RC-CHN mentioned this pull request May 18, 2026

refactor(ltm): redesign long-term memory with context compaction (reopen of #8144) #8226

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(ltm): redesign long-term memory with append-only incremental contexts#8144

refactor(ltm): redesign long-term memory with append-only incremental contexts#8144
RC-CHN wants to merge 55 commits into
AstrBotDevs:masterfrom
RC-CHN:refactor-ltm

RC-CHN commented May 11, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Uh oh!

gemini-code-assist Bot May 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

		self.contexts: dict[str, list[dict]] = defaultdict(list)
		"""累积累积态 LLM 上下文。由 ContextManager 修改后保留。"""

Uh oh!

Conversation

RC-CHN commented May 11, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications / 改动点

Core: astrbot/builtin_stars/astrbot/long_term_memory.py

Hook wiring: astrbot/builtin_stars/astrbot/main.py

Config: astrbot/builtin_stars/astrbot/default.py

Agent runner: astrbot/core/astr_main_agent.py

Tests: tests/unit/test_long_term_memory.py (new, 47 tests)

Screenshots or Test Results / 运行截图或测试结果

Checklist / 检查清单

Summary by Sourcery

Summary by Sourcery

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

RC-CHN commented May 11, 2026 •

edited by sourcery-ai Bot

Loading

Core: `astrbot/builtin_stars/astrbot/long_term_memory.py`

Hook wiring: `astrbot/builtin_stars/astrbot/main.py`

Config: `astrbot/builtin_stars/astrbot/default.py`

Agent runner: `astrbot/core/astr_main_agent.py`

Tests: `tests/unit/test_long_term_memory.py` (new, 47 tests)